Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study
Several high-resource Text-to-Speech (TTS) systems currently produce natural,
human-like speech. In contrast, low-resource languages, including Arabic, have
very few TTS systems due to the lack of resources.
We propose a fully unsupervised method for building TTS, including automatic
data selection and pre-training/fine-tuning strategies for TTS training, using
broadcast news as a case study. We show how careful selection of data, even in
smaller amounts, can yield a TTS system that generates more natural speech than
one trained on a larger dataset. We propose different approaches for: 1) the
data: we apply automatic annotation using DNSMOS, automatic vowelization, and
automatic speech recognition (ASR) to fix transcription errors; 2) the model:
we use transfer learning from a high-resource language for the TTS model,
fine-tune it on one hour of broadcast recordings, and then use this model to
guide duration prediction in a FastSpeech2-based Conformer model. Our objective
evaluation shows a character error rate (CER) of 3.9%, compared with 1.3% for
the ground truth. In the subjective evaluation, on a scale from 1 (bad) to 5
(excellent), our FastSpeech2-based Conformer model achieved a mean opinion
score (MOS) of 4.4 for intelligibility and 4.2 for naturalness; many annotators
recognized the broadcaster's voice, which demonstrates the effectiveness of our
proposed unsupervised method.
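As a rough illustration of the data-selection step described above, the Python
sketch below filters utterances by an estimated perceptual-quality score and by
agreement between the reference transcript and an ASR hypothesis. The
quality_fn and asr_fn callables and the 3.5/0.05 thresholds are placeholders
for illustration, not components or values taken from the paper.

    from typing import Callable, Iterable, List, Tuple

    def char_error_rate(ref: str, hyp: str) -> float:
        # Character error rate via Levenshtein edit distance.
        prev = list(range(len(hyp) + 1))
        for i, r in enumerate(ref, start=1):
            cur = [i]
            for j, h in enumerate(hyp, start=1):
                cur.append(min(prev[j] + 1,              # deletion
                               cur[j - 1] + 1,           # insertion
                               prev[j - 1] + (r != h)))  # substitution
            prev = cur
        return prev[-1] / max(len(ref), 1)

    def select_utterances(
        utterances: Iterable[Tuple[str, str]],  # (wav_path, transcript) pairs
        quality_fn: Callable[[str], float],     # e.g. a DNSMOS-style quality predictor
        asr_fn: Callable[[str], str],           # any ASR system used to re-check transcripts
        min_quality: float = 3.5,               # assumed threshold, not from the paper
        max_cer: float = 0.05,                  # assumed threshold, not from the paper
    ) -> List[Tuple[str, str]]:
        # Keep only utterances that sound clean and whose transcript agrees with ASR.
        selected = []
        for wav_path, transcript in utterances:
            if quality_fn(wav_path) < min_quality:
                continue
            if char_error_rate(transcript, asr_fn(wav_path)) > max_cer:
                continue
            selected.append((wav_path, transcript))
        return selected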
LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Models
It has been shown that Large Language Model (LLM) alignments can be
circumvented by appending specially crafted attack suffixes to harmful
queries to elicit harmful responses. To conduct attacks against private target
models whose characterization is unknown, public models can be used as proxies
to fashion the attack, with successful attacks being transferred from public
proxies to private target models. The attack success rate depends on how
closely the proxy model approximates the private model. We hypothesize that
for attacks to be transferable, it is sufficient if the proxy can approximate the
target model in the neighborhood of the harmful query. Therefore, in this
paper, we propose \emph{Local Fine-Tuning (LoFT)}, \textit{i.e.}, fine-tuning
proxy models on similar queries that lie in the lexico-semantic neighborhood of
harmful queries to decrease the divergence between the proxy and target models.
First, we demonstrate three approaches to prompt private target models to
obtain similar queries given harmful queries. Next, we obtain data for local
fine-tuning by eliciting responses from target models for the generated similar
queries. Then, we optimize attack suffixes to generate attack prompts and
evaluate the impact of our local fine-tuning on the attack's success rate.
Experiments show that local fine-tuning of proxy models improves attack
transferability and increases the attack success rate by , , and
(absolute) on the target models ChatGPT, GPT-4, and Claude, respectively.
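The pipeline described in the abstract can be summarized in a short Python
sketch. All callables (paraphrase_fn, target_api, finetune_fn,
optimize_suffix_fn) are placeholders for the actual query-generation,
target-model, fine-tuning, and suffix-optimization components; they are
assumptions for illustration, not the paper's implementation.

    from typing import Callable, List, Tuple

    def loft_pipeline(
        harmful_queries: List[str],
        paraphrase_fn: Callable[[str], List[str]],         # step 1: similar queries near each harmful query
        target_api: Callable[[str], str],                  # step 2: responses from the private target model
        finetune_fn: Callable[[List[Tuple[str, str]]], object],  # step 3: locally fine-tune the proxy
        optimize_suffix_fn: Callable[[object, str], str],  # step 4: suffix search against the proxy
    ) -> List[str]:
        # Return attack prompts (harmful query + optimized suffix) for transfer to the target.
        # 1. Expand each harmful query into its lexico-semantic neighborhood.
        neighborhood = [q for hq in harmful_queries for q in paraphrase_fn(hq)]
        # 2. Collect the target model's responses to the neighboring queries.
        pairs = [(q, target_api(q)) for q in neighborhood]
        # 3. Fine-tune the local proxy so it mimics the target on that neighborhood.
        proxy = finetune_fn(pairs)
        # 4. Optimize an attack suffix against the fine-tuned proxy for each harmful query.
        return [hq + " " + optimize_suffix_fn(proxy, hq) for hq in harmful_queries]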
Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models
We introduce Jais and Jais-chat, new state-of-the-art Arabic-centric
foundation and instruction-tuned open generative large language models (LLMs).
The models are based on the GPT-3 decoder-only architecture and are pretrained
on a mixture of Arabic and English texts, including source code in various
programming languages. With 13 billion parameters, they demonstrate better
knowledge and reasoning capabilities in Arabic than any existing open Arabic
and multilingual models by a sizable margin, based on extensive evaluation.
Moreover, the models are competitive in English compared to English-centric
open models of similar size, despite being trained on much less English data.
We provide a detailed description of the training, the tuning, the safety
alignment, and the evaluation of the models. We release two open versions of
the model -- the foundation Jais model, and an instruction-tuned Jais-chat
variant -- with the aim of promoting research on Arabic LLMs. Available at
https://huggingface.co/inception-mbzuai/jais-13b-chat
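A minimal loading sketch for the released checkpoint, assuming the standard
Hugging Face transformers API; the exact chat prompt template and recommended
generation settings are documented on the model card linked above, and the
example prompt here is only illustrative.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "inception-mbzuai/jais-13b-chat"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # trust_remote_code is assumed to be required because the checkpoint ships
    # a custom architecture; check the model card for the exact loading recipe.
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

    prompt = "What is the capital of the UAE?"  # replace with the model card's chat template
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))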